[ty] replace `FxHash{Map, Set}` with footgun-mitigated APIs #21686

mtshiba · 2025-11-29T06:35:34Z

Summary

If a data structure that depends on salsa IDs is used as the key of an FxHashMap or the element of an FxHashSet, the iteration order will be unstable. If a query depends on this order, the results of fixed-point iteration will be unstable as well.
In this case, Fx{Index, Order}{Map, Set} should be used, but this does not seem to be done consistently.

Simply replacing all FxHash{Map, Set} with Fx{Index, Order}{Map, Set} would solve the problem, but the former is generally believed to perform slightly better when a set/map is only used for insertion and lookup and is never iterated.

~~Therefore, this PR proposes a compromise.~~
That is, replace all FxHash{Map, Set} used within ty with wrapper structs that do not implement (Into)Iterator, and instead define methods like unstable_iter so that users of these structs have to consider whether each iteration is safe.
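
For illustration only (this is not the code in this PR): a minimal sketch of such a wrapper, assuming std's HashMap in place of FxHashMap so that the example is self-contained. The names NoIterMap and total are hypothetical.

```rust
use std::borrow::Borrow;
use std::collections::hash_map;
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical wrapper: exposes insertion and lookup, but deliberately
/// does not implement `IntoIterator`.
pub struct NoIterMap<K, V>(HashMap<K, V>);

impl<K: Eq + Hash, V> NoIterMap<K, V> {
    pub fn new() -> Self {
        Self(HashMap::new())
    }

    pub fn insert(&mut self, key: K, value: V) -> Option<V> {
        self.0.insert(key, value)
    }

    pub fn get<Q>(&self, key: &Q) -> Option<&V>
    where
        K: Borrow<Q>,
        Q: Eq + Hash + ?Sized,
    {
        self.0.get(key)
    }

    /// The method name forces the caller to acknowledge that the
    /// iteration order is unspecified.
    pub fn unstable_iter(&self) -> hash_map::Iter<'_, K, V> {
        self.0.iter()
    }
}

/// An order-insensitive use: a sum is the same for any iteration order.
pub fn total(map: &NoIterMap<&str, u32>) -> u32 {
    map.unstable_iter().map(|(_, value)| *value).sum()
}
```

A `for` loop over a NoIterMap does not compile, so every iteration site has to spell out unstable_iter() and can be audited by searching for that name.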

After performing this refactoring, I discovered some suspicious parts. I hope this fix will help to resolve the issue.

Test Plan

astral-sh-bot · 2025-11-29T06:37:56Z

Diagnostic diff on typing conformance tests

No changes detected when running ty on typing conformance tests ✅

astral-sh-bot · 2025-11-29T06:39:26Z

`mypy_primer` results

Changes were detected when running on open source projects

scikit-build-core (https://github.com/scikit-build/scikit-build-core)
+ src/scikit_build_core/_logging.py:153:13: warning[unsupported-base] Unsupported class base with type `<class 'Mapping[str, Style]'> | <class 'Mapping[str, Divergent]'>`
- Found 41 diagnostics
+ Found 42 diagnostics

pandas-stubs (https://github.com/pandas-dev/pandas-stubs)
+ pandas-stubs/_typing.pyi:1209:16: warning[unused-ignore-comment] Unused blanket `type: ignore` directive
- Found 5511 diagnostics
+ Found 5512 diagnostics

No memory usage changes detected ✅

astral-sh-bot · 2025-11-29T06:45:08Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

crates/ty_python_semantic/src/types/ide_support.rs

crates/ty_python_semantic/src/types/overrides.rs

mtshiba · 2025-11-29T08:50:38Z

crates/ty_python_semantic/src/types/constraints.rs


-        let mut typevars = FxHashSet::default();
+        // We should use `FxIndexSet` here since `BoundTypeVarInstance::{valid, required}_specializations` is query-dependent.
+        let mut typevars = FxIndexSet::default();


We should probably use FxIndexSet here.
The rest of this module seems okay to use unstable iterators, unless I'm overlooking something.

Can you say more about what query-dependent means?

Looking at the loop below, it doesn't seem to depend on ordering: it returns true only if all typevars satisfy the constraints, and as far as I can tell no state is carried between the typevar checks.

{valid, required}_specializations uses lazy_{bound, constraints} internally. I meant that the order in which these queries are called is non-deterministic.
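
The two positions in this thread can be sketched as follows. This is a toy model, not ty's code: check is a hypothetical stand-in for a per-typevar check that calls a query internally, and the calls log records the order in which those "queries" run.

```rust
/// Hypothetical stand-in for a per-typevar check that calls a query.
fn check(typevar: u32, calls: &mut Vec<u32>) -> bool {
    calls.push(typevar); // record the order in which "queries" are called
    typevar != 0 // hypothetical rule: typevar 0 fails the check
}

/// Returns true only if every typevar passes; stops at the first failure.
fn all_satisfied(typevars: &[u32], calls: &mut Vec<u32>) -> bool {
    typevars.iter().all(|&typevar| check(typevar, calls))
}
```

For the orders [1, 0, 2] and [2, 1, 0] the boolean result is the same, but the recorded calls differ ([1, 0] versus [2, 1, 0]): the return value is order-independent, while the sequence of calls made is not.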

crates/ty_project/src/files.rs

crates/ty_python_semantic/src/types/generics.rs

mtshiba · 2025-11-29T11:38:06Z

Unfortunately, it appears that non-determinism remains due to factors entirely different from those considered in this PR.
However, that doesn't mean I think these changes are useless (they appear to improve performance slightly in benchmarks of large projects).

Anyway, I'm marking this PR as ready for review.

MichaReiser · 2025-11-29T18:11:08Z

I like the approach, but I don't think using an IndexMap is the solution.

Iterating over two HashMaps, each created by inserting the same elements in the same order, will yield the same iteration order when run on the same platform.

The issue we see with fixed-point iteration is that it's no longer guaranteed that the elements are inserted in the same order. However, we'll have the exact same issue when using IndexMap, because IndexMap's iteration order is defined by insertion order, which isn't guaranteed to be deterministic within a cyclic query.

I'm not sure what the solution here is, other than applying some sort of sorting (somewhere?).
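
A minimal sketch of the difference between insertion order and sorted order, using only std. A Vec-backed helper stands in for an insertion-ordered set such as IndexSet (an assumption made to keep the example dependency-free), and the function names are hypothetical. Sorting only helps if the elements have an ordering that is itself stable across runs, which may not hold for values that embed salsa IDs.

```rust
use std::collections::BTreeSet;

/// Stand-in for an insertion-ordered set: iteration follows insertion order.
fn insertion_ordered(items: &[u32]) -> Vec<u32> {
    let mut seen = Vec::new();
    for &item in items {
        if !seen.contains(&item) {
            seen.push(item);
        }
    }
    seen
}

/// Sorting removes the dependence on insertion order.
fn sorted(items: &[u32]) -> Vec<u32> {
    items
        .iter()
        .copied()
        .collect::<BTreeSet<_>>()
        .into_iter()
        .collect()
}
```

If two runs insert the same elements as [3, 1, 2] and [2, 3, 1], insertion_ordered returns different sequences, while sorted returns [1, 2, 3] for both.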

MichaReiser · 2025-11-30T16:29:06Z

There's also one flaky diagnostic. I suspect that our convergence functions are now sensitive to which query is the outermost cycle, or that some query bails early as soon as it sees the first Divergent type (any type), and the any type is only visible depending on the cycle nesting.

MichaReiser

I haven't reviewed all the changes but I don't think the StableKey assumption is safe.

I do think it makes sense to have a hash map wrapper and this is also what rustc does https://github.com/rust-lang/rust/blob/ae90dcf0207c57c3034f00b07048d63f8b2363c8/compiler/rustc_data_structures/src/stable_map.rs#L45

Another solution (maybe less invasive?) could be to initialize the hasher with a random state (instead of 0) in debug builds, so that iteration order is guaranteed to be different across runs (or have a feature flag that we can turn on).
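
A rough sketch of what a seeded hasher could look like, assuming std's DefaultHasher as a stand-in for FxHasher. SeededState, for_build, SeededMap and hash_of are hypothetical names, not existing ty or rustc_hash APIs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{BuildHasher, Hasher};

/// Hypothetical `BuildHasher` whose hashers are all perturbed by a seed.
#[derive(Clone, Copy, Debug)]
pub struct SeededState {
    seed: u64,
}

impl SeededState {
    pub fn with_seed(seed: u64) -> Self {
        Self { seed }
    }

    /// Caller-supplied seed in debug builds, fixed seed 0 in release builds.
    pub fn for_build(debug_seed: u64) -> Self {
        if cfg!(debug_assertions) {
            Self::with_seed(debug_seed)
        } else {
            Self::with_seed(0)
        }
    }
}

impl BuildHasher for SeededState {
    type Hasher = DefaultHasher;

    fn build_hasher(&self) -> DefaultHasher {
        // `DefaultHasher::new()` always starts from the same state;
        // feeding the seed first makes every hash depend on it.
        let mut hasher = DefaultHasher::new();
        hasher.write_u64(self.seed);
        hasher
    }
}

/// A map type that uses the seeded hasher.
pub type SeededMap<K, V> = HashMap<K, V, SeededState>;

/// Hash a single value with the given state (useful for inspection).
pub fn hash_of(state: &SeededState, value: u32) -> u64 {
    let mut hasher = state.build_hasher();
    hasher.write_u32(value);
    hasher.finish()
}
```

With the same seed, hashes (and therefore iteration order for the same sequence of operations) are reproducible; a different seed will very likely shuffle the order, which is what would surface hidden order dependencies.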

crates/ruff_benchmark/benches/ty.rs

crates/ty_project/src/db/changes.rs

crates/ty_project/src/files.rs

crates/ty_project/src/lib.rs

crates/ty_python_semantic/src/module_resolver/typeshed.rs

crates/ty_python_semantic/src/semantic_index/place.rs

crates/ty_python_semantic/src/lib.rs

Gankra · 2025-12-01T14:39:46Z

re: randomizing layouts: if you want to go down that route I suggest cribbing from rustc's -Zrandomize-layout which lets you pass the seed in as a CLI argument (and print the seed any time one is selected so people can try to reproduce an issue).
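
A small sketch of the "pass the seed in, and print it when one is selected" idea. The `--hash-seed=` flag name and the seed_from_args function are made up for illustration; they are not options of ty or rustc.

```rust
/// Reads a hypothetical `--hash-seed=<n>` argument, falling back to
/// `fallback` (for example a freshly generated random value) when absent.
/// The selected seed is printed so that a failing run can be reproduced.
pub fn seed_from_args<I>(args: I, fallback: u64) -> u64
where
    I: IntoIterator<Item = String>,
{
    let seed = args
        .into_iter()
        .find_map(|arg| {
            arg.strip_prefix("--hash-seed=")
                .and_then(|value| value.parse::<u64>().ok())
        })
        .unwrap_or(fallback);
    eprintln!("hash seed: {seed}");
    seed
}
```

In a binary this would be called as seed_from_args(std::env::args(), fallback), and the returned value handed to whatever constructs the hasher.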

MichaReiser

I think I'm leaning towards changing the hash maps in separate PRs and more explicitly talk about why the changes are necessary. I'm not convinced that it's necessary to use FxIndexMap in many cases.

crates/ruff_benchmark/benches/ty.rs

crates/ty_project/src/db/changes.rs

crates/ty_project/src/files.rs

crates/ty_python_semantic/src/hash.rs

crates/ty_python_semantic/src/lint.rs

MichaReiser · 2025-12-02T08:03:48Z

crates/ty_python_semantic/src/types/constraints.rs


-        let mut typevars = FxHashSet::default();
+        // We should use `FxIndexSet` here since `BoundTypeVarInstance::{valid, required}_specializations` is query-dependent.
+        let mut typevars = FxIndexSet::default();


Can you say more about what query-dependent means?

Looking at the loop below, it doesn't seem to depend on ordering: it returns true only if all typevars satisfy the constraints, and as far as I can tell no state is carried between the typevar checks.

MichaReiser · 2025-12-02T08:05:28Z

crates/ty_python_semantic/src/types/ide_support.rs

 /// List all members of a given type: anything that would be valid when accessed
 /// as an attribute on an object of the given type.
-pub fn all_members<'db>(db: &'db dyn Db, ty: Type<'db>) -> FxHashSet<Member<'db>> {
+pub fn all_members<'db>(db: &'db dyn Db, ty: Type<'db>) -> FxIndexSet<Member<'db>> {


Are we using this anywhere outside the LSP?

It seems to be used in ty_python_semantic as well (https://github.com/mtshiba/ruff/blob/stable-iteration/crates/ty_python_semantic/src/types/function.rs#L1464-L1467). In that case, however, an unstable iterator is not a problem.

that is just for internal use so that we can test the function itself in our mdtests; it's not a public-facing part of ty_python_semantic


mtshiba · 2025-12-03T18:04:45Z

I think I'm leaning towards changing the hash maps in separate PRs and more explicitly talk about why the changes are necessary. I'm not convinced that it's necessary to use FxIndexMap in many cases.

I changed this PR to just replace FxHash{Map, Set} with new APIs. Therefore, this PR does not change any of the behavior of ty. Changes that affect behavior will be made after this PR (with reconsideration).

MichaReiser

Thanks for updating.

I'd be curious to get some more opinions on this. To me, this PR goes from one extreme to the other. I don't think we need to enforce this new API in all crates. E.g. I think it's completely fine to use the normal FxHashMap and IntoIterator in the project file discovery.

I'm leaning towards:

- Giving those types a different name instead of overriding FxHashSet. I'm not entirely sure what to name them
- Updating our Contribution guidelines to state that these types should be used in ty_python_semantic, with an explanation of why it's important
- Adding the types to ruff_db

But maybe it's just me who dislikes the extra verbosity and others actually prefer to override FxHashSet and FxHashMap

mtshiba · 2025-12-04T09:46:48Z

The purpose of this PR is to make developers aware that using FxHash{Map, Set} as an iterator is inherently unstable, and in a sense, it intentionally introduces noisy APIs.
However, if you feel it's troublesome to continue using unstable_iter() in places where it is clearly OK to use it as an iterator, one option is to give FxHash{Map, Set} a tag type and change its behavior depending on the tag. In this case, there is no need to prepare two types of structs.

use std::collections::HashMap;
use std::collections::hash_map::IntoIter;

/// You are sure you can always use this hash map/set as an iterator.
#[derive(Default)]
pub struct ImplIntoIterator;
#[derive(Default)]
pub struct NoIntoIterator;

#[derive(Default)]
pub struct FxHashMap<K, V, Tag=NoIntoIterator>(HashMap<K, V>, std::marker::PhantomData<Tag>);

impl<K, V> IntoIterator for FxHashMap<K, V, ImplIntoIterator> {
    type Item = (K, V);
    type IntoIter = IntoIter<K, V>;

    fn into_iter(self) -> IntoIter<K, V> {
        self.0.into_iter()
    }
}

fn main() {
    let map1: FxHashMap<u32, u32> = FxHashMap::default();
    let map2: FxHashMap<u32, u32, ImplIntoIterator> = FxHashMap::default();
    
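    for _i in map1 {} // ERR: the default `NoIntoIterator` tag has no `IntoIterator` impl
    for _i in map2 {} // OK: `ImplIntoIterator` opts in to iteration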
}

Using FxHashMap<K, V, ImplIntoIterator> declares that the map can always be used as an iterator without problems. Using FxHashMap<K, V, NoIntoIterator> means that you need to check that iteration is OK each time you iterate (if it is safe, you can use unstable_iter()).
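
A compilable variant of the same idea, as a sketch: std's HashMap stands in for the Fx-hashed map, the type is renamed to the hypothetical TaggedMap to avoid confusion with the real FxHashMap, and Default is implemented by hand because `#[derive(Default)]` would add `K: Default`, `V: Default` and `Tag: Default` bounds.

```rust
use std::collections::{hash_map, HashMap};
use std::hash::Hash;
use std::marker::PhantomData;

/// Tag: the owner guarantees that iteration order never matters.
pub struct ImplIntoIterator;
/// Tag (default): iteration must go through `unstable_iter`.
pub struct NoIntoIterator;

pub struct TaggedMap<K, V, Tag = NoIntoIterator>(HashMap<K, V>, PhantomData<Tag>);

// Manual impl so that no `Default` bounds are required on K, V or Tag.
impl<K, V, Tag> Default for TaggedMap<K, V, Tag> {
    fn default() -> Self {
        Self(HashMap::new(), PhantomData)
    }
}

// Available for every tag.
impl<K: Eq + Hash, V, Tag> TaggedMap<K, V, Tag> {
    pub fn insert(&mut self, key: K, value: V) -> Option<V> {
        self.0.insert(key, value)
    }

    /// Explicit, greppable escape hatch for order-insensitive iteration.
    pub fn unstable_iter(&self) -> hash_map::Iter<'_, K, V> {
        self.0.iter()
    }
}

// Only the opted-in tag can be used directly in a `for` loop.
impl<K, V> IntoIterator for TaggedMap<K, V, ImplIntoIterator> {
    type Item = (K, V);
    type IntoIter = hash_map::IntoIter<K, V>;

    fn into_iter(self) -> Self::IntoIter {
        self.0.into_iter()
    }
}
```

The default tag keeps the noisy API, while a TaggedMap<K, V, ImplIntoIterator> behaves like an ordinary map at iteration sites.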

[ty] do not use an iterator of FxHash{Map, Set} as query input

14725d3

mtshiba added 2 commits November 29, 2025 15:42

minor fixes

fadc97c

cargo shear

56a9cf4

add StableKey

a1eb5bd

mtshiba force-pushed the stable-iteration branch from 006a794 to a1eb5bd Compare November 29, 2025 08:34

Update constraints.rs

1f77ba9

mtshiba closed this Nov 29, 2025

mtshiba reopened this Nov 29, 2025

mtshiba added 3 commits November 29, 2025 18:47

InferableTypeVars should use FxIndexSet

aca1841

Update constraints.rs

2cc4178

ty_project::files::IndexedFiles should use FxIndexSet

e844245

mtshiba closed this Nov 29, 2025

mtshiba reopened this Nov 29, 2025

AlexWaygood added the ty Multi-file analysis & type inference label Nov 29, 2025

mtshiba marked this pull request as ready for review November 29, 2025 11:38

mtshiba requested review from AlexWaygood, MichaReiser, carljm, dcreager and sharkdp as code owners November 29, 2025 11:38

MichaReiser reviewed Dec 1, 2025

mtshiba added 3 commits December 2, 2025 02:02

remove StableKey APIs

a8b2206

create hash.rs and move FxHash{Map, Set} into it

3dc82b8

revert: ty_project::Project: FxIndexSet -> FxHashSet

cf451d8

mtshiba force-pushed the stable-iteration branch from 1f605d3 to 1966a1c Compare December 2, 2025 03:58

use BTreeMap for ty_project::files::Indexed

47570b9

mtshiba force-pushed the stable-iteration branch from 1966a1c to 47570b9 Compare December 2, 2025 04:00

mtshiba added 3 commits December 2, 2025 13:01

Merge branch 'main' into stable-iteration

9d4ec18

wrap used APIs

5641711

cargo shear

6cee18d

MichaReiser requested changes Dec 2, 2025

sharkdp removed their request for review December 2, 2025 08:20

carljm removed their request for review December 3, 2025 06:46

mtshiba added 4 commits December 4, 2025 01:16

make trait bounds faithfully the same as `std::collections::{HashMap, HashSet}`

55bec3a

revert using BTreeSet for files

19f2daf

don't implement IntoIterator for ProjectFiles, Indexed

65c04e4

revert using FxIndexSet for unstable iteration parts

64267bf

mtshiba force-pushed the stable-iteration branch from 0b6f880 to 64267bf Compare December 3, 2025 17:45

Merge branch 'main' into stable-iteration

c3d55d6

mtshiba changed the title ~~[ty] disallow using unstable iterators of FxHash{Map, Set} as query input~~ [ty] replace FxHash{Map, Set} with footgun-mitigated APIs Dec 3, 2025

mtshiba requested a review from MichaReiser December 3, 2025 18:08

MichaReiser reviewed Dec 4, 2025
